These data were published by Domanska et al. (2021) from the Frode Jahnsen lab and are accessible via EGA accession EGAD00001007765. Reads were aligned by Brian Gloss.
Digestion
Resected colonic tissues were processed within 2 h after removal from the patient. Single-cell suspensions of colonic resections were obtained using a modified version of a previously published protocol (Bujko et al., 2018). The intestinal specimens were opened longitudinally and washed in Dulbecco’s PBS. The muscularis propria was first removed with scissors, after which the mucosa was dissected into narrow strips. The mucosal fragments were then incubated with shaking in PBS with 2 mM EDTA (Sigma-Aldrich) and 1% FCS (Sigma-Aldrich) three times for 15 min at 37°C. The remaining tissue was minced and digested with stirring for 60 min in complete RPMI (RPMI 1640 [Lonza] supplemented with 10% FCS, 1% penicillin/streptomycin [Lonza] and containing 0.25 mg/ml Liberase TL [Roche] and 20 U/ml DNase I [Sigma-Aldrich]). The digested cell suspension was passed through a 100-µm filter and washed.
scRNA-seq
Cellular suspensions (∼15,000 cells, with expected recovery of ∼7,500
cells) of sorted CD45+HLA-DR+CD14+ macrophages from colonic mucosa and
muscularis propria were loaded on the 10X Chromium Controller instrument
(10X Genomics) according to the manufacturer’s protocol using 10X
GEMCode proprietary technology. All samples from individual patients
were loaded in one batch. The Chromium Single Cell 3′ v2 Reagent kit
(10X Genomics) was used to generate the cDNA and prepare the libraries,
according to the manufacturer’s protocol. The libraries were then
equimolarly pooled and sequenced on an Illumina NextSeq500 using
HighOutput flow cells v2.5. A coverage of 400 million reads per sample
was targeted to obtain 50,000 reads per cell. The raw data were then
demultiplexed and processed with the Cell Ranger software (10X Genomics)
v2.1.1.
Data processing
In total, we analyzed 63,917 human cells from donors (n = 4). We aligned the reads of the input dataset to the GRCh38 reference genome and estimated cell-containing partitions and associated unique molecular identifiers (UMIs) using Cell Ranger (Chromium Single Cell RNA-seq) v3.0.2. We performed data preparation using the Seurat R package. Genes expressed in fewer than three cells in a sample were excluded, as were cells that expressed fewer than 200 genes or had mitochondrial gene content >5% of the total UMI count. We normalized the data by dividing the gene counts for each cell by that cell’s total counts, multiplying by 10,000, and then log-transforming. Subsequently, we identified genes that were outliers on a mean variability plot using the vst method with 2,000 genes. For the mucosa and muscularis data, we separately found integration anchors and then performed data integration using the precomputed anchor set with default parameters. Finally, we scaled the data and centered genes in the dataset using a linear model.
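This description corresponds to Seurat’s standard anchor-based workflow; a minimal sketch of the steps as described (names like `obj_list` are placeholder assumptions, not the authors’ code):

library(Seurat)
# Assumed: obj_list is a list of per-sample Seurat objects for one tissue
# (mucosa or muscularis). Normalise and find variable features per sample.
obj_list <- lapply(obj_list, function(x) {
  x <- NormalizeData(x, normalization.method = "LogNormalize", scale.factor = 10000)
  FindVariableFeatures(x, selection.method = "vst", nfeatures = 2000)
})
anchors    <- FindIntegrationAnchors(object.list = obj_list)  # find integration anchors
integrated <- IntegrateData(anchorset = anchors)              # integrate with default parameters
integrated <- ScaleData(integrated)                           # scale and center genes (linear model)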
setwd("/Users/thomasoneil/Library/CloudStorage/OneDrive-TheUniversityofSydney(Staff)/projectz/EGAD00001007765_bujko_singlecell")
# Create the output directories if they don't already exist
if(!dir.exists("raw")){dir.create("raw")}
if(!dir.exists("data")){dir.create("data")}
if(!dir.exists("plots")){dir.create("plots")}
if(!dir.exists("plots/qc")){dir.create("plots/qc")}
# Read the sample metadata (the ninth file listed in raw/)
meta <- read.csv(paste0("raw/",list.files("raw")[9]))
# List to collect the per-sample Seurat objects
dat_list <- list()
Here I’ll show how I add percent.mt as a metric for measuring mitochondrial content. I’ll also recreate the published data processing and filtering, removing cells that don’t have more than 200 unique genes or that exceed 5% mitochondrial content.
First I load in the data, keeping cells with at least 3 detected genes and genes that are expressed in at least 3 cells. This removes a lot of empty-droplet reads. For example, without the initial filter you’re working with 38,606 genes and 1,240,621 cells; after the initial filters you’re only working with 25,654 genes and 112,253 cells. You’ll see that this gets cut down to about 2,000 cells after filtering, so don’t worry about losing too much data in this first sample.
i=1
# Load the raw (unfiltered) matrix, keeping genes detected in at least 3 cells
# and cells with at least 3 detected genes
dat <- CreateSeuratObject(Read10X(paste0("raw/", meta$file_accession_id[i], "/raw_feature_bc_matrix")),
                          project = meta$file_accession_id[i],
                          min.cells = 3,
                          min.features = 3)
# Percentage of counts mapping to mitochondrial genes
dat$percent.mt <- PercentageFeatureSet(dat, pattern="^MT-")
Using this, we can visualise the cut-offs we choose, such as nFeature_RNA < 200 and percent.mt > 5.
p1 = FeatureScatter(dat, "nFeature_RNA", "nCount_RNA", pt.size=2) + geom_vline(xintercept = 200) + NoLegend()
## Rasterizing points since number of points exceeds 100,000.
## To disable this behavior set `raster=FALSE`
p2 = VlnPlot(dat, "percent.mt", pt.size=0.1, raster=F)+geom_hline(yintercept = 5)+NoLegend()+ylab("Percent.mt")+ggtitle("")+theme(axis.text.x = element_text(size=0))+xlab("")
plot_grid(p1,p2, rel_widths = c(3,2))
We can also visualise the distribution of poor-QC and mitochondria-high cells, and see exactly what you’re cutting out.
dat<-qcMe(dat, nfeat=200, percent.mito = 5, return.seurat=T)
## Percentage of cells that pass QC:2.01
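`qcMe()` is a custom helper that isn’t defined anywhere in this document. A minimal sketch of what it presumably does, given how it’s called and how the `filt` column is used by the `subset()` call below (a reconstruction under those assumptions, not the actual function; the real helper likely also draws the QC plots):

qcMe <- function(obj, nfeat = 200, percent.mito = 5, return.seurat = TRUE) {
  # Flag cells passing both thresholds
  pass <- obj$nFeature_RNA > nfeat & obj$percent.mt < percent.mito
  obj$filt <- ifelse(pass, "keep", "throw")  # subset(..., filt != "throw") relies on this column
  message("Percentage of cells that pass QC:", round(100 * mean(pass), 2))
  if (return.seurat) obj else obj@meta.data
}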
# Keep genes detected (non-zero counts) in more than `cutoff` cells
cutoff = 10
passcutoff = rowSums(dat[["RNA"]]$counts!=0)>cutoff
data.frame(Features=Features(dat), Counts = rowSums(dat[["RNA"]]$counts!=0)) %>%
ggplot(aes(Counts)) + geom_histogram(aes(fill='red'), bins=500)+theme_pubclean()+NoLegend()+
geom_vline(xintercept = cutoff, alpha=0.1)+
xlim(0,1000)+
labs(subtitle=paste0("Cut off set: ",
cutoff,
".\n",
round(100*sum(passcutoff)/nrow(dat),2)
, "% (", sum(passcutoff), " genes pass cut off)"))
dat <- subset(dat, subset = filt!="throw", features = Features(dat)[rowSums(dat[["RNA"]]$counts !=0)>cutoff])
dat$tis = meta$sample_title[i]
dat$sample = meta$sample_alias[i]
dat$accession = meta$file_accession_id[i]
dat_list[[i]] <- dat
The remaining samples are processed below with the same cut-off metrics (or, equivalently, in a loop, as sketched next).
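Since every sample goes through identical steps, the whole per-sample pass could be wrapped in a loop; a sketch (assuming `meta` has one row per sample; the per-sample QC plots are omitted here):

# Hypothetical loop equivalent of the per-sample blocks below
for (i in seq_len(nrow(meta))) {
  dat <- CreateSeuratObject(Read10X(paste0("raw/", meta$file_accession_id[i], "/raw_feature_bc_matrix")),
                            project = meta$file_accession_id[i],
                            min.cells = 3, min.features = 3)
  dat$percent.mt <- PercentageFeatureSet(dat, pattern = "^MT-")
  dat <- qcMe(dat, nfeat = 200, percent.mito = 5, return.seurat = TRUE)
  keep <- rowSums(dat[["RNA"]]$counts != 0) > 10      # genes detected in >10 cells
  dat <- subset(dat, subset = filt != "throw", features = Features(dat)[keep])
  dat$tis       <- meta$sample_title[i]
  dat$sample    <- meta$sample_alias[i]
  dat$accession <- meta$file_accession_id[i]
  dat_list[[i]] <- dat
}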
i=2
dat <- CreateSeuratObject(Read10X(paste0("raw/", meta$file_accession_id[i], "/raw_feature_bc_matrix")),
project = meta$file_accession_id[i],
min.cells= 3,
min.features=3)
dat$percent.mt <- PercentageFeatureSet(dat, pattern="^MT-")
Using this, we can visualise the cut-offs we choose, such as nFeature_RNA < 200 and percent.mt > 5.
p1 = FeatureScatter(dat, "nFeature_RNA", "nCount_RNA", pt.size=2) + geom_vline(xintercept = 200) + NoLegend()
## Rasterizing points since number of points exceeds 100,000.
## To disable this behavior set `raster=FALSE`
p2 = VlnPlot(dat, "percent.mt", pt.size=0.1, raster=F)+geom_hline(yintercept = 5)+NoLegend()+ylab("Percent.mt")+ggtitle("")+theme(axis.text.x = element_text(size=0))+xlab("")
plot_grid(p1,p2, rel_widths = c(3,2))
We can also visualise the distribution of poor-QC and mitochondria-high cells, and see exactly what you’re cutting out.
dat<-qcMe(dat, nfeat=200, percent.mito = 5, return.seurat=T)
## Percentage of cells that pass QC:2.85
cutoff = 10
passcutoff = rowSums(dat[["RNA"]]$counts!=0)>cutoff
data.frame(Features=Features(dat), Counts = rowSums(dat[["RNA"]]$counts!=0)) %>%
ggplot(aes(Counts)) + geom_histogram(aes(fill='red'), bins=500)+theme_pubclean()+NoLegend()+
geom_vline(xintercept = cutoff, alpha=0.1)+
xlim(0,1000)+
labs(subtitle=paste0("Cut off set: ",
cutoff,
".\n",
round(100*sum(passcutoff)/nrow(dat),2)
, "% (", sum(passcutoff), " genes pass cut off)"))
dat <- subset(dat, subset = filt!="throw", features = Features(dat)[rowSums(dat[["RNA"]]$counts !=0)>cutoff])
dat$tis = meta$sample_title[i]
dat$sample = meta$sample_alias[i]
dat$accession = meta$file_accession_id[i]
dat_list[[i]] <- dat
i=3
dat <- CreateSeuratObject(Read10X(paste0("raw/", meta$file_accession_id[i], "/raw_feature_bc_matrix")),
project = meta$file_accession_id[i],
min.cells= 3,
min.features=3)
dat$percent.mt <- PercentageFeatureSet(dat, pattern="^MT-")
Using this, we can visualise the cut-offs we choose, such as nFeature_RNA < 200 and percent.mt > 5.
p1 = FeatureScatter(dat, "nFeature_RNA", "nCount_RNA", pt.size=2) + geom_vline(xintercept = 200) + NoLegend()
## Rasterizing points since number of points exceeds 100,000.
## To disable this behavior set `raster=FALSE`
p2 = VlnPlot(dat, "percent.mt", pt.size=0.1, raster=F)+geom_hline(yintercept = 5)+NoLegend()+ylab("Percent.mt")+ggtitle("")+theme(axis.text.x = element_text(size=0))+xlab("")
plot_grid(p1,p2, rel_widths = c(3,2))
We can also visualise the distribution of poor-QC and mitochondria-high cells, and see exactly what you’re cutting out.
dat<-qcMe(dat, nfeat=200, percent.mito = 5, return.seurat=T)
## Percentage of cells that pass QC:2.2
cutoff = 10
passcutoff = rowSums(dat[["RNA"]]$counts!=0)>cutoff
data.frame(Features=Features(dat), Counts = rowSums(dat[["RNA"]]$counts!=0)) %>%
ggplot(aes(Counts)) + geom_histogram(aes(fill='red'), bins=500)+theme_pubclean()+NoLegend()+
geom_vline(xintercept = cutoff, alpha=0.1)+
xlim(0,1000)+
labs(subtitle=paste0("Cut off set: ",
cutoff,
".\n",
round(100*sum(passcutoff)/nrow(dat),2)
, "% (", sum(passcutoff), " genes pass cut off)"))
dat <- subset(dat, subset = filt!="throw", features = Features(dat)[rowSums(dat[["RNA"]]$counts !=0)>cutoff])
dat$tis = meta$sample_title[i]
dat$sample = meta$sample_alias[i]
dat$accession = meta$file_accession_id[i]
dat_list[[i]] <- dat
i=4
dat <- CreateSeuratObject(Read10X(paste0("raw/", meta$file_accession_id[i], "/raw_feature_bc_matrix")),
project = meta$file_accession_id[i],
min.cells= 3,
min.features=3)
dat$percent.mt <- PercentageFeatureSet(dat, pattern="^MT-")
Using this, we can visualise the cut-offs we choose, such as nFeature_RNA < 200 and percent.mt > 5.
p1 = FeatureScatter(dat, "nFeature_RNA", "nCount_RNA", pt.size=2) + geom_vline(xintercept = 200) + NoLegend()
## Rasterizing points since number of points exceeds 100,000.
## To disable this behavior set `raster=FALSE`
p2 = VlnPlot(dat, "percent.mt", pt.size=0.1, raster=F)+geom_hline(yintercept = 5)+NoLegend()+ylab("Percent.mt")+ggtitle("")+theme(axis.text.x = element_text(size=0))+xlab("")
plot_grid(p1,p2, rel_widths = c(3,2))
We can also visualise the distribution of poor-QC and mitochondria-high cells, and see exactly what you’re cutting out.
dat<-qcMe(dat, nfeat=200, percent.mito = 5, return.seurat=T)
## Percentage of cells that pass QC:3.47
cutoff = 10
passcutoff = rowSums(dat[["RNA"]]$counts!=0)>cutoff
data.frame(Features=Features(dat), Counts = rowSums(dat[["RNA"]]$counts!=0)) %>%
ggplot(aes(Counts)) + geom_histogram(aes(fill='red'), bins=500)+theme_pubclean()+NoLegend()+
geom_vline(xintercept = cutoff, alpha=0.1)+
xlim(0,1000)+
labs(subtitle=paste0("Cut off set: ",
cutoff,
".\n",
round(100*sum(passcutoff)/nrow(dat),2)
, "% (", sum(passcutoff), " genes pass cut off)"))
dat <- subset(dat, subset = filt!="throw", features = Features(dat)[rowSums(dat[["RNA"]]$counts !=0)>cutoff])
dat$tis = meta$sample_title[i]
dat$sample = meta$sample_alias[i]
dat$accession = meta$file_accession_id[i]
dat_list[[i]] <- dat
i=5
dat <- CreateSeuratObject(Read10X(paste0("raw/", meta$file_accession_id[i], "/raw_feature_bc_matrix")),
project = meta$file_accession_id[i],
min.cells= 3,
min.features=3)
dat$percent.mt <- PercentageFeatureSet(dat, pattern="^MT-")
Using this, we can visualise the cut-offs we choose, such as nFeature_RNA < 200 and percent.mt > 5.
p1 = FeatureScatter(dat, "nFeature_RNA", "nCount_RNA", pt.size=2) + geom_vline(xintercept = 200) + NoLegend()
## Rasterizing points since number of points exceeds 100,000.
## To disable this behavior set `raster=FALSE`
p2 = VlnPlot(dat, "percent.mt", pt.size=0.1, raster=F)+geom_hline(yintercept = 5)+NoLegend()+ylab("Percent.mt")+ggtitle("")+theme(axis.text.x = element_text(size=0))+xlab("")
plot_grid(p1,p2, rel_widths = c(3,2))
We can also visualise the distribution of poor-QC and mitochondria-high cells, and see exactly what you’re cutting out.
dat<-qcMe(dat, nfeat=200, percent.mito = 5, return.seurat=T)
## Percentage of cells that pass QC:5.13
cutoff = 10
passcutoff = rowSums(dat[["RNA"]]$counts!=0)>cutoff
data.frame(Features=Features(dat), Counts = rowSums(dat[["RNA"]]$counts!=0)) %>%
ggplot(aes(Counts)) + geom_histogram(aes(fill='red'), bins=500)+theme_pubclean()+NoLegend()+
geom_vline(xintercept = cutoff, alpha=0.1)+
xlim(0,1000)+
labs(subtitle=paste0("Cut off set: ",
cutoff,
".\n",
round(100*sum(passcutoff)/nrow(dat),2)
, "% (", sum(passcutoff), " genes pass cut off)"))
dat <- subset(dat, subset = filt!="throw", features = Features(dat)[rowSums(dat[["RNA"]]$counts !=0)>cutoff])
dat$tis = meta$sample_title[i]
dat$sample = meta$sample_alias[i]
dat$accession = meta$file_accession_id[i]
dat_list[[i]] <- dat
i=6
dat <- CreateSeuratObject(Read10X(paste0("raw/", meta$file_accession_id[i], "/raw_feature_bc_matrix")),
project = meta$file_accession_id[i],
min.cells= 3,
min.features=3)
dat$percent.mt <- PercentageFeatureSet(dat, pattern="^MT-")
Using this, we can visualise the cut-offs we choose, such as nFeature_RNA < 200 and percent.mt > 5.
p1 = FeatureScatter(dat, "nFeature_RNA", "nCount_RNA", pt.size=2) + geom_vline(xintercept = 200) + NoLegend()
p2 = VlnPlot(dat, "percent.mt", pt.size=0.1, raster=F)+geom_hline(yintercept = 5)+NoLegend()+ylab("Percent.mt")+ggtitle("")+theme(axis.text.x = element_text(size=0))+xlab("")
plot_grid(p1,p2, rel_widths = c(3,2))
We can also visualise the distribution of poor-QC and mitochondria-high cells, and see exactly what you’re cutting out.
dat<-qcMe(dat, nfeat=200, percent.mito = 5, return.seurat=T)
## Percentage of cells that pass QC:3.43
cutoff = 10
passcutoff = rowSums(dat[["RNA"]]$counts!=0)>cutoff
data.frame(Features=Features(dat), Counts = rowSums(dat[["RNA"]]$counts!=0)) %>%
ggplot(aes(Counts)) + geom_histogram(aes(fill='red'), bins=500)+theme_pubclean()+NoLegend()+
geom_vline(xintercept = cutoff, alpha=0.1)+
xlim(0,1000)+
labs(subtitle=paste0("Cut off set: ",
cutoff,
".\n",
round(100*sum(passcutoff)/nrow(dat),2)
, "% (", sum(passcutoff), " genes pass cut off)"))
dat <- subset(dat, subset = filt!="throw", features = Features(dat)[rowSums(dat[["RNA"]]$counts !=0)>cutoff])
dat$tis = meta$sample_title[i]
dat$sample = meta$sample_alias[i]
dat$accession = meta$file_accession_id[i]
dat_list[[i]] <- dat
i=7
dat <- CreateSeuratObject(Read10X(paste0("raw/", meta$file_accession_id[i], "/raw_feature_bc_matrix")),
project = meta$file_accession_id[i],
min.cells= 3,
min.features=3)
dat$percent.mt <- PercentageFeatureSet(dat, pattern="^MT-")
Using this, we can visualise the cut-offs we choose, such as nFeature_RNA < 200 and percent.mt > 5.
p1 = FeatureScatter(dat, "nFeature_RNA", "nCount_RNA", pt.size=2) + geom_vline(xintercept = 200) + NoLegend()
## Rasterizing points since number of points exceeds 100,000.
## To disable this behavior set `raster=FALSE`
p2 = VlnPlot(dat, "percent.mt", pt.size=0.1, raster=F)+geom_hline(yintercept = 5)+NoLegend()+ylab("Percent.mt")+ggtitle("")+theme(axis.text.x = element_text(size=0))+xlab("")
plot_grid(p1,p2, rel_widths = c(3,2))
We can also visualise the distribution of poor-QC and mitochondria-high cells, and see exactly what you’re cutting out.
dat<-qcMe(dat, nfeat=200, percent.mito = 5, return.seurat=T)
## Percentage of cells that pass QC:2.59
cutoff = 10
passcutoff = rowSums(dat[["RNA"]]$counts!=0)>cutoff
data.frame(Features=Features(dat), Counts = rowSums(dat[["RNA"]]$counts!=0)) %>%
ggplot(aes(Counts)) + geom_histogram(aes(fill='red'), bins=500)+theme_pubclean()+NoLegend()+
geom_vline(xintercept = cutoff, alpha=0.1)+
xlim(0,1000)+
labs(subtitle=paste0("Cut off set: ",
cutoff,
".\n",
round(100*sum(passcutoff)/nrow(dat),2)
, "% (", sum(passcutoff), " genes pass cut off)"))
dat <- subset(dat, subset = filt!="throw", features = Features(dat)[rowSums(dat[["RNA"]]$counts !=0)>cutoff])
dat$tis = meta$sample_title[i]
dat$sample = meta$sample_alias[i]
dat$accession = meta$file_accession_id[i]
dat_list[[i]] <- dat
i=8
dat <- CreateSeuratObject(Read10X(paste0("raw/", meta$file_accession_id[i], "/raw_feature_bc_matrix")),
project = meta$file_accession_id[i],
min.cells= 3,
min.features=3)
dat$percent.mt <- PercentageFeatureSet(dat, pattern="^MT-")
Using this, we can visualise the cut-offs we choose, such as nFeature_RNA < 200 and percent.mt > 5.
p1 = FeatureScatter(dat, "nFeature_RNA", "nCount_RNA", pt.size=2) + geom_vline(xintercept = 200) + NoLegend()
## Rasterizing points since number of points exceeds 100,000.
## To disable this behavior set `raster=FALSE`
p2 = VlnPlot(dat, "percent.mt", pt.size=0.1, raster=F)+geom_hline(yintercept = 5)+NoLegend()+ylab("Percent.mt")+ggtitle("")+theme(axis.text.x = element_text(size=0))+xlab("")
plot_grid(p1,p2, rel_widths = c(3,2))
We can also visualise the distribution of poor-QC and mitochondria-high cells, and see exactly what you’re cutting out.
dat<-qcMe(dat, nfeat=200, percent.mito = 5, return.seurat=T)
## Percentage of cells that pass QC:5.21
cutoff = 10
passcutoff = rowSums(dat[["RNA"]]$counts!=0)>cutoff
data.frame(Features=Features(dat), Counts = rowSums(dat[["RNA"]]$counts!=0)) %>%
ggplot(aes(Counts)) + geom_histogram(aes(fill='red'), bins=500)+theme_pubclean()+NoLegend()+
geom_vline(xintercept = cutoff, alpha=0.1)+
xlim(0,1000)+
labs(subtitle=paste0("Cut off set: ",
cutoff,
".\n",
round(100*sum(passcutoff)/nrow(dat),2)
, "% (", sum(passcutoff), " genes pass cut off)"))
dat <- subset(dat, subset = filt!="throw", features = Features(dat)[rowSums(dat[["RNA"]]$counts !=0)>cutoff])
dat$tis = meta$sample_title[i]
dat$sample = meta$sample_alias[i]
dat$accession = meta$file_accession_id[i]
dat_list[[i]] <- dat
Here I’ll merge the samples and show the integrated data.
# Merge all eight samples; in Seurat v5 the counts are kept as separate per-sample layers
dat <- merge(dat_list[[1]], y = dat_list[2:8])
# Print the joined object for inspection (not assigned, so `dat` keeps its per-sample layers)
JoinLayers(dat)
## An object of class Seurat
## 24332 features across 35387 samples within 1 assay
## Active assay: RNA (24332 features, 0 variable features)
## 1 layer present: counts
#dat[["RNA"]] <- split(dat[["RNA"]], f = dat$sample)
dat <- NormalizeData(dat,verbose=F)
dat <- FindVariableFeatures(dat,verbose=F)
dat <- ScaleData(dat,verbose=F)
dat <- RunPCA(dat,verbose=F)
dat <- IntegrateLayers(object = dat, method = RPCAIntegration, orig.reduction = "pca", new.reduction = "integrated.rpca")
## Computing within dataset neighborhoods
## Finding all pairwise anchors
## (pairwise anchor search output truncated: 900-3467 anchors found per sample pair)
## Merging dataset 6 into 8
## Merging dataset 3 into 5
## Merging dataset 2 into 8 6
## Merging dataset 1 into 7
## Merging dataset 4 into 8 6 2
## Merging dataset 7 1 into 5 3
## Merging dataset 5 3 7 1 into 8 6 2 4
## Integrating data
dat <- RunUMAP(dat, reduction = "integrated.rpca", dims=1:30)
## 09:42:44 UMAP embedding parameters a = 0.9922 b = 1.112
## 09:42:44 Read 35387 rows and found 30 numeric columns
## 09:42:44 Using Annoy for neighbor search, n_neighbors = 30
## 09:42:44 Building Annoy index with metric = cosine, n_trees = 50
## 0% 10 20 30 40 50 60 70 80 90 100%
## [----|----|----|----|----|----|----|----|----|----|
## **************************************************|
## 09:42:45 Writing NN index file to temp file /var/folders/h5/tq0dp1p95rsfkwyjmhp757hr0000gn/T//Rtmp2t5RKL/file6660178c82c6
## 09:42:45 Searching Annoy index using 1 thread, search_k = 3000
## 09:42:51 Annoy recall = 100%
## 09:42:52 Commencing smooth kNN distance calibration using 1 thread with target n_neighbors = 30
## 09:42:52 Initializing from normalized Laplacian + noise (using RSpectra)
## 09:42:53 Commencing optimization for 200 epochs, with 1607182 positive edges
## 09:43:03 Optimization finished
dat <- FindNeighbors(dat, dims=1:30, reduction="integrated.rpca")
## Computing nearest neighbor graph
## Computing SNN
dat <- FindClusters(dat, resolution=0.2, reduction="integrated.rpca")
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
##
## Number of nodes: 35387
## Number of edges: 1327082
##
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.9202
## Number of communities: 11
## Elapsed time: 4 seconds
UMAPPlot(dat, label=T, pt.size=2, label.box=T)
dat <- JoinLayers(dat)
As this is the first time, this will just be a test of this new function. I have set up a few reference datasets to derive reference annotations, simply as a suggestion for labelling. If these reference data are used, you will see metadata columns labelled `GCA_broad_subset`, or columns for specific cell subsets, such as `GCA_MNP_`. For this initial test, I’m using the paediatric data.
ref <- readRDS("/Users/thomasoneil/Library/CloudStorage/OneDrive-TheUniversityofSydney(Staff)/projectz/ref_temp/paediatricdata_subset.rds")
ref$annot <- ref$category
# Transfer reference labels onto the query cells via Seurat label transfer
predictions <- TransferData(
  anchorset = FindTransferAnchors(reference = ref, query = dat, features=VariableFeatures(ref)),
  refdata = ref$annot
)
## Performing PCA on the provided reference using 1702 features as input.
## Projecting cell embeddings
## Finding neighborhoods
## Finding anchors
## Found 2400 anchors
## Finding integration vectors
## Finding integration vector weights
## Predicting cell labels
# Rename columns: predicted ID, per-label scores ("prediction.score.<label>" -> "GCA_broad_<label>"),
# then drop the max-score column
colnames(predictions) = c("GCA_broad_IDs", paste0("GCA_broad_", sapply(str_split(colnames(predictions)[2:(ncol(predictions)-1)], "\\."), function(x) x[3])), "Max")
predictions = predictions[,-ncol(predictions)]
dat <- AddMetaData(dat, metadata=predictions)
plot_grid(proportions(dat, "ident", "GCA_broad_IDs", "fill"),UMAPPlot(dat, label=T, label.box=T, pt.size=2)+NoLegend()+NoAxes(), rel_widths = c(2,1))
## `summarise()` has grouped output by 'ident.2'. You can override using the
## `.groups` argument.
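Note that `proportions()` here is a custom plotting helper, not base R’s `proportions()`. A minimal sketch of an equivalent, assuming it takes two metadata columns (with "ident" meaning the active identities) and a ggplot bar position (a reconstruction, not the actual function):

library(dplyr)
library(ggplot2)

# Hypothetical reconstruction of the proportions() helper used above
proportions <- function(obj, x, fill, position = "fill") {
  df <- obj@meta.data
  df$x    <- if (x == "ident") as.character(Idents(obj)) else df[[x]]
  df$fill <- df[[fill]]
  df %>%
    count(x, fill) %>%
    ggplot(aes(x = x, y = n, fill = fill)) +
    geom_col(position = position) +
    labs(x = NULL, y = "Proportion", fill = fill)
}

With position = "fill", geom_col() normalises each bar to 1, so the cluster-by-label composition is comparable across clusters of different sizes.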
dat <- RenameIdents(dat,
"0"="Mo1",
"1"="Mo2",
"2"="Mo3",
"4"="Mo4",
"8"="Cycling",
"5"="Plasma",
"9"="Lymphocytes",
"6"="Non-Imm",
"10"="Mixed/Non-Imm",
"3"="LowQ",
"7"="LowQ"
)
# Markers: CD14/LYZ/S100A8/CD163 (monocyte-macrophage), ITGAE (CD103), MKI67/TYMS (cycling),
# CD79A/MZB1 (plasma/B cells), CD3E (T cells), COL1A1 (stromal), DSG2/AOC1 (epithelial)
DotPlot(dat, features=c("CD14", "LYZ", "S100A8", "CD163",
                        "ITGAE", "MKI67", "TYMS", "CD79A", "MZB1", "CD3E", "COL1A1", "DSG2", "AOC1"))+RotatedAxis()
dat$init <- Idents(dat)
saveRDS(dat, "data/data_qc_int.rds")
Next, the Mo1-4 clusters are subset and re-integrated.
dat <- subset(dat, idents= c("Mo1", "Mo2", "Mo3", "Mo4"))
JoinLayers(dat) # printed for inspection; the result is not assigned, so dat is unchanged
## An object of class Seurat
## 24332 features across 29947 samples within 1 assay
## Active assay: RNA (24332 features, 2000 variable features)
## 3 layers present: data, counts, scale.data
## 3 dimensional reductions calculated: pca, integrated.rpca, umap
dat[["RNA"]] <- split(dat[["RNA"]], f = dat$sample)
## Splitting 'counts', 'data' layers. Not splitting 'scale.data'. If you would like to split other layers, set in `layers` argument.
dat <- NormalizeData(dat,verbose=F)
dat <- FindVariableFeatures(dat,verbose=F)
dat <- ScaleData(dat,verbose=F)
dat <- RunPCA(dat,verbose=F)
dat <- IntegrateLayers(object = dat, method = RPCAIntegration, orig.reduction = "pca", new.reduction = "integrated.rpca")
## Computing within dataset neighborhoods
## Finding all pairwise anchors
## (pairwise anchor search output truncated: 825-3440 anchors found per sample pair)
## Merging dataset 1 into 7
## Merging dataset 3 into 5
## Merging dataset 6 into 2
## Merging dataset 2 6 into 8
## Merging dataset 4 into 8 2 6
## Merging dataset 7 1 into 5 3
## Merging dataset 5 3 7 1 into 8 2 6 4
## Integrating data
dat <- RunUMAP(dat, reduction = "integrated.rpca", dims=1:30)
## 09:45:39 UMAP embedding parameters a = 0.9922 b = 1.112
## 09:45:39 Read 29947 rows and found 30 numeric columns
## 09:45:39 Using Annoy for neighbor search, n_neighbors = 30
## 09:45:39 Building Annoy index with metric = cosine, n_trees = 50
## 0% 10 20 30 40 50 60 70 80 90 100%
## [----|----|----|----|----|----|----|----|----|----|
## **************************************************|
## 09:45:40 Writing NN index file to temp file /var/folders/h5/tq0dp1p95rsfkwyjmhp757hr0000gn/T//Rtmp2t5RKL/file66605d1d4189
## 09:45:40 Searching Annoy index using 1 thread, search_k = 3000
## 09:45:45 Annoy recall = 100%
## 09:45:46 Commencing smooth kNN distance calibration using 1 thread with target n_neighbors = 30
## 09:45:46 Initializing from normalized Laplacian + noise (using RSpectra)
## 09:45:47 Commencing optimization for 200 epochs, with 1338986 positive edges
## 09:45:55 Optimization finished
dat <- FindNeighbors(dat, dims=1:30, reduction="integrated.rpca")
## Computing nearest neighbor graph
## Computing SNN
dat <- FindClusters(dat, resolution=0.2, reduction="integrated.rpca")
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
##
## Number of nodes: 29947
## Number of edges: 1086090
##
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.9025
## Number of communities: 5
## Elapsed time: 4 seconds
dat <- JoinLayers(dat)
dat <- RenameIdents(dat,
"0" ="Mo1",
"1" ="Mo2",
"2" ="Mo3",
"3" ="Mo4",
"4" ="Mo5"
)
UMAPPlot(dat, label=T, pt.size=2, label.box=T)
saveRDS(dat, "data/Mo_int.rds")